# Lightweight Transformer Models

| Model | License | Description | Tags | Publisher | Downloads | Likes |
|---|---|---|---|---|---|---|
| SAUTE | MIT | A lightweight Transformer architecture with speaker awareness, designed for modeling multi-speaker dialogues. | Dialogue Systems, Transformers, English | JustinDuc | 229 | 1 |
| Terjman Nano V2.0 | N/A | Terjman-Nano-v2.0, a 77M-parameter Transformer model for English–Moroccan-dialect translation, optimized for high-quality, precise output. | Machine Translation, Transformers, Multilingual | atlasia | 95 | 2 |
| Spec Vision V1 | MIT | Spec-Vision-V1, a lightweight open-source multimodal model designed for deep integration of visual and textual data, supporting a 128K context length. | Text-to-Image, Transformers, Other | SVECTOR-CORPORATION | 17 | 1 |
| Mochi 1 Transformer 42 | Apache-2.0 | A distilled version of the genmoai mochi-1 transformer with 42 of the original 48 modules, slimmed down by iteratively removing the modules with the smallest MSE. | Text-to-Video, English | NimVideo | 62 | 3 |
| Spam Mail Classifier | Apache-2.0 | A text classifier fine-tuned from microsoft/Multilingual-MiniLM-L12-H384 that labels email subjects as SPAM or NOSPAM. | Text Classification, Transformers | Goodmotion | 943 | 3 |
| Segformer B0 512x1024 City 160k | Other | A lightweight SegFormer-based semantic segmentation model pre-trained on the Cityscapes dataset. | Image Segmentation | smp-hub | 44 | 0 |
| Sapiens Depth 0.3b Torchscript | N/A | Part of the Sapiens family of vision transformers pre-trained on 300 million 1024x1024 human images, used here for depth estimation. | 3D Vision, English | facebook | 69 | 0 |
| Sat 3l Sm | MIT | State-of-the-art sentence segmentation using a 3-layer Transformer, supporting multilingual text. | Sequence Labeling, Transformers, Multilingual | segment-any-text | 168.01k | 6 |
| Sat 3l | MIT | sat-3l, a 3-layer Transformer for use with wtpsplit, achieving state-of-the-art sentence segmentation. | Sequence Labeling, Transformers, Multilingual | segment-any-text | 5,790 | 3 |
| Meshgpt Preview | Apache-2.0 | MeshGPT, a text-to-3D model built on autoencoders and Transformers; described as the first publicly available 3D model tokenizer. | 3D Vision, Transformers | MarcusLoren | 254 | 49 |
| Octo Small 1.5 | MIT | A Transformer-based diffusion-policy model for robot control that predicts robot actions from visual inputs and language instructions. | Multimodal Fusion, Transformers | rail-berkeley | 250 | 6 |
| Paraphrase MiniLM L6 V2 Finetune Summary | N/A | A sentence-transformers embedding model that maps text into a 384-dimensional vector space for semantic search and text similarity. | Text Embedding, Transformers | tonychenxyz | 20 | 1 |
| Sts Distilcamembert Base | MIT | A French sentence embedding model based on DistilCamemBERT that encodes sentences or paragraphs into 768-dimensional vectors for similarity tasks. | Text Embedding, Transformers, French | h4c5 | 48 | 1 |
| Simple Stories 4M | MIT | Part of the Simple Stories series of small text-generation models trained on the TinyStories dataset, focused on generating children's stories. | Text Generation, Transformers, English | broskicodes | 104 | 16 |
| Octo Small | MIT | A diffusion-policy robot control model that predicts 7-dimensional actions for the next 4 steps, suitable for multi-source robot datasets. | Multimodal Fusion, Transformers | rail-berkeley | 335 | 13 |
| Ced Base | Apache-2.0 | CED, a simple ViT-based audio tagging model with state-of-the-art performance on AudioSet. | Audio Classification, Transformers | mispeech | 1,318 | 7 |
| T5 Translate Vietnamese Nom | MIT | A lightweight pre-trained Transformer designed for bidirectional translation between Vietnamese Nôm and Latin script. | Machine Translation, Transformers, Other | minhtoan | 17 | 3 |
| Mobilevitv2 1.0 Voc Deeplabv3 | Other | A MobileViTv2-based semantic segmentation model pre-trained on the PASCAL VOC dataset, handling 512x512 images. | Image Segmentation, Transformers | shehan97 | 1,075 | 0 |
| Segformer B0 Flair One | Apache-2.0 | SegFormer b0, the lightweight variant of the efficient Transformer-based SegFormer segmentation model. | Image Segmentation, Transformers | alanoix | 14 | 1 |
| Internal.wav2vec2 Base Superb Ks Int8 Structured79 | Apache-2.0 | wav2vec2-base-ft-keyword-spotting fine-tuned on the SUPERB dataset for audio classification, optimized with quantization and structured pruning. | Audio Classification, Transformers | yujiepan | 16 | 0 |
| Vit Small Patch16 224.dino | Apache-2.0 | A Vision Transformer (ViT) image feature model trained with the self-supervised DINO method, suited to image classification and feature extraction. | Image Classification, Transformers | timm | 70.62k | 4 |
| T5 Small Vietnamese News | MIT | A lightweight pre-trained encoder-decoder Transformer designed for Vietnamese news summarization. | Text Generation, Transformers, Other | minhtoan | 104 | 4 |
| T5 Small Wikilingua Vietnamese | MIT | A state-of-the-art lightweight Vietnamese encoder-decoder Transformer specialized for text summarization. | Text Generation, Transformers, Other | minhtoan | 43 | 3 |
| Nat Mini In1k 224 | MIT | NAT-Mini, a lightweight vision Transformer built on the neighborhood attention mechanism, designed for ImageNet classification. | Image Classification, Transformers, Other | shi-labs | 109 | 0 |
| T5 Small | Apache-2.0 | T5-small, a pre-trained encoder-decoder model that handles many tasks through a unified text-to-text format, with multilingual support. | Large Language Model, Transformers, Multilingual | optimum | 11.43k | 9 |
| Levit 128S | Apache-2.0 | LeViT-128S, a vision Transformer pre-trained on ImageNet-1k that borrows from convolutional networks for faster inference. | Image Classification, Transformers | facebook | 3,198 | 4 |
| Levit 384 | Apache-2.0 | LeViT-384, a vision Transformer pre-trained on ImageNet-1k that borrows from convolutional networks for faster inference. | Image Classification, Transformers | facebook | 37 | 0 |
| HPD MiniLM F128 | Apache-2.0 | A sentence representation model for semantic retrieval compressed via Homomorphic Projective Distillation (HPD): 23 million parameters, 87MB on disk. | Text Embedding, Transformers | Xuandong | 13 | 0 |
| Fnet Base Finetuned Cola | Apache-2.0 | google/fnet-base fine-tuned on the GLUE CoLA dataset, used to compare the FNet and BERT architectures. | Text Classification, Transformers, English | gchhablani | 15 | 0 |
| Distil Eng Quora Sentence | N/A | A sentence-transformers embedding model that maps sentences into a 768-dimensional vector space for semantic similarity and text clustering. | Text Embedding, Transformers | mboth | 39 | 1 |
| Xtremedistil L6 H384 Uncased | MIT | XtremeDistilTransformers, a task-agnostic knowledge-distilled lightweight Transformer applicable to a range of NLP tasks. | Large Language Model, English | microsoft | 1,854 | 23 |
| Minilm L12 H384 Uncased | MIT | MiniLM, a compact, efficient pre-trained language model compressed through deep self-attention distillation, for language understanding and generation. | Large Language Model | microsoft | 10.19k | 89 |
| Deit Tiny Patch16 224 | Apache-2.0 | DeiT, a data-efficiently trained vision Transformer, pre-trained and fine-tuned on ImageNet-1k for image classification. | Image Classification, Transformers | facebook | 29.04k | 9 |
| Multilingual MiniLM L12 H384 | MIT | A multilingual MiniLM compressed through deep self-attention distillation, supporting multilingual understanding and generation. | Large Language Model, Multilingual | microsoft | 28.51k | 83 |
| T5 Base Chinese | N/A | A truncated MT5-base whose vocabulary and word embeddings are trimmed to Chinese and English characters, for Chinese-English text processing. | Large Language Model | lemon234071 | 102 | 16 |
| Xtremedistil L12 H384 Uncased | MIT | XtremeDistilTransformers distilled through task transfer learning, yielding a small universal model applicable to any task and language. | Large Language Model, Transformers, English | microsoft | 471 | 15 |
| Xtremedistil L6 H256 Uncased | MIT | XtremeDistilTransformers distilled through task transfer learning into a small general-purpose model for a variety of tasks and languages. | Large Language Model, Transformers, English | microsoft | 3,816 | 33 |
| Deit Small Patch16 224 | Apache-2.0 | DeiT, a data-efficiently trained vision Transformer, pre-trained and fine-tuned on ImageNet-1k at 224x224 resolution for image classification. | Image Classification, Transformers | facebook | 24.53k | 8 |
| Distilroberta Base | Apache-2.0 | DistilRoBERTa, a distilled version of RoBERTa-base with fewer parameters and faster inference, for English text processing. | Large Language Model, English | distilbert | 1.2M | 153 |
| Distilbert Base En Ur Cased | Apache-2.0 | A distilled version of distilbert-base-multilingual-cased restricted to English and Urdu while preserving the original model's representations. | Large Language Model, Transformers, Other | Geotrend | 32 | 1 |
© 2025 AIbase